PCI Express to PCI Express based low latency interconnect scheme for clustering systems

ABSTRACT

PCI-Express (PCIE) is a bus or I/O interconnect standard typically used for communication between a PCIE enabled computing system having a root complex and a plurality of peripheral devices, where the PCIE bus may include a PCIE switch with a single inbound PCIE port. An outbound PCIE port on the root complex of the computing system connects to the single inbound PCIE port of the PCIE switch. The invention describes a PCIE based interconnect scheme using a PCIE network switch to enable switching and inter-connection between multiple PCIE enabled computing systems, forming a cluster for data transfer using PCIE protocol. For this, an outbound port enabled for interconnection on each of the plurality of PCIE based computing systems in the cluster connects to one of a plurality of inbound PCIE ports on the PCIE network switch, enabling communication between the PCIE based computing systems of the cluster.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 15/175,800 titled "PCI Express to PCI Express based low latency interconnect scheme for clustering systems" filed on Jun. 7, 2016, which is a continuation of U.S. application Ser. No. 14/588,937 titled "PCI Express to PCI Express based low latency interconnect scheme for clustering systems" filed on Jan. 3, 2015, currently U.S. Pat. No. 9,519,608, which is a continuation of U.S. patent application Ser. No. 13/441,883 titled "PCI Express to PCI Express based low latency interconnect scheme for clustering systems" filed on Apr. 8, 2012, which was abandoned, which is a continuation of U.S. patent application Ser. No. 11/242,463 titled "PCI Express to PCI Express based low latency interconnect scheme for clustering systems" filed on Oct. 4, 2005, which issued as U.S. Pat. No. 8,189,603 on May 29, 2012, all of which have a common inventor, and are hereby incorporated by reference for all that they contain.

TECHNICAL FIELD

The invention generally relates to providing a high speed interconnect between systems within an interconnected cluster of systems.

BACKGROUND AND PRIOR ART

The need for a high speed, low latency cluster interconnect scheme for data and information transport between systems has been recognized as a limiting factor to achieving high speed operation in clustered systems, and one needing immediate attention to resolve. The growth of interconnected and distributed processing schemes has made it essential that high speed interconnect schemes be defined and established to provide the speeds necessary to take advantage of the high speeds being achieved by data processing systems and to enable faster data sharing between interconnected systems.

There are today interconnect schemes that allow data transfer at high speeds; the most common and fastest interconnect scheme existing today is the Ethernet connection, allowing transport speeds from 10 Mb/s to as high as 10 Gb/s. The TCP/IP protocols used with Ethernet have high overhead and inherent latency that make them unsuitable for some distributed applications. Further, TCP/IP tends to drop data packets during periods of high traffic congestion, requiring resends of the lost packets; the resulting delays in data transfer are not acceptable for high reliability system operation. Recent developments in optical transport also provide high speed interconnect capability. Efforts are under way in different areas of data transport to reduce the latency of the interconnect, as this is a limitation on the growth of distributed computing, control and storage systems. All these require either changes in transmission protocols, re-encapsulation of data, or modulation of data into alternate forms, with associated delays, increased latencies and associated costs.

DESCRIPTION

What is Proposed

PCI Express (PCIE) has achieved a prominent place as the I/O interconnect standard for use inside computers, processing systems and embedded systems, allowing serial high speed data transfer to and from peripheral devices. A typical PCIE link provides a 2.5 to 3.8 Gb/s transfer rate (this may change as the standard and data rates change). The PCIE standard is evolving fast, becoming faster, starting to become firm, and being used within more and more systems. Typically each PCIE based system has a root complex which controls all connections and data transfers to and from connected peripheral devices through PCIE peripheral end points or peripheral modules. What is disclosed is the use of PCIE standard based peripherals, enabled for interconnection and connected directly to similar PCIE standard based peripherals using data links, as an interconnect between multiple systems, typically through one or more network switches. This interconnect scheme uses PCIE based protocols for data transfer over direct physical connection links between the PCIE based peripheral devices (see FIG. 1), without any intermediate conversion of the transmitted data stream to other data transmission protocols or encapsulation of the transmitted data stream within other data transmission protocols, thereby reducing the latencies of communication between the connected PCIE based systems within the cluster. The PCIE standard based peripheral, enabled for interconnection at a peripheral end point of the system and directly connected to the switch by PCIE peripheral to PCIE peripheral data links, provides for an increase in the number of links per connection as the bandwidth needs of system interconnections increase, and thereby allows scaling of the bandwidth available within any single interconnect, or the system of interconnects, as required.
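
As a rough illustration of this link-width scaling, the short Python sketch below computes the usable bandwidth of an xN connection. The 2.5 Gb/s per-lane signaling rate and the 8b/10b line coding correspond to first-generation PCIE; these figures, and the constant and function names, are assumptions for illustration and are not taken from the disclosure.

    # Illustrative only: usable bandwidth of an xN PCIE connection, assuming
    # first-generation 2.5 Gb/s per-lane signaling with 8b/10b line coding.
    PER_LANE_RAW_GBPS = 2.5        # raw signaling rate per lane (assumed)
    ENCODING_EFFICIENCY = 8 / 10   # 8b/10b coding leaves 80% usable payload

    def aggregate_bandwidth_gbps(lanes: int) -> float:
        """Usable bandwidth, in Gb/s, of a connection with the given lane count."""
        return lanes * PER_LANE_RAW_GBPS * ENCODING_EFFICIENCY

    for lanes in (1, 4, 8, 16, 32, 64):
        print(f"x{lanes}: {aggregate_bandwidth_gbps(lanes):.1f} Gb/s usable")

Widening a connection from x1 to x32 or x64 thus multiplies the available bandwidth without changing the interconnect architecture, which is the scaling property relied upon above.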

Some Advantages of the Proposed Connection Scheme

1. Reduced latency of data transfer, as conversions from PCIE to other protocols such as Ethernet are avoided during transfer.

2. The number of links per connection can scale from x1 to larger numbers such as x32 or even x64 as PCIE capabilities increase, to cater to the connection bandwidth needed. Minimal change in interconnect architecture is needed as bandwidth increases, enabling easy scaling with need.

3. Any speed increase in the link connection due to technology advances is directly applicable to the interconnection scheme.

4. Standardization of the PCIE based peripheral will make components easily available from multiple vendors, making the implementation of the interconnect scheme easier and cheaper.

5. The PCIE based peripheral to PCIE based peripheral link connections allow ease of software control and provide reliable bandwidth.

6. The use of standardized PCIE based peripheral modules enabled for interconnection as outbound ports, and the use of PCI-Express enabled ports on the PCI-Express based network switch for interconnection between PCI-Express based network switches, will allow for easy expansion of the cluster as computing needs grow.

7. The PCIE links and switches are agnostic to the data transmission and can be updated with new technology as it becomes available, to speed up data transfer between clustered PCI-Express enabled computing systems (also called PCIE computing systems: computing systems using a PCI-Express bus for peripheral component interconnection, where the PCIE bus is under the control of a root complex of the respective computing system) without changing the capabilities and protocols of the interconnect scheme.

DESCRIPTION OF FIGURES

FIG. 1 is a typical interconnected (multi-system) cluster, shown with eight systems connected in a star architecture using direct connected data links from PCIE standard based peripheral to PCIE standard based peripheral.

FIG. 2 is a cluster using multiple interconnect modules or switches to interconnect smaller clusters.

EXPLANATION OF NUMBERING AND LETTERING IN FIG. 1

(1) to (8): Systems interconnected in FIG. 1.
(9): Switch sub-system.
(10): Software configuration and control input for the switch.
(1a) to (8a): PCI Express based peripheral modules (PCIE modules) attached to the systems.
(1b) to (8b): PCI Express based peripheral modules (PCIE modules) at the switch.
(1L) to (8L): PCIE based peripheral module to PCIE based peripheral module connections having n links (n data links).

EXPLANATION OF NUMBERING AND LETTERING IN FIG. 2

(12-1) and (12-2): Clusters.
(9-1) and (9-2): Interconnect modules or switch sub-systems.
(10-1) and (10-2): Software configuration inputs.
(11-1) and (11-2): Switch to switch interconnect modules in the clusters.
(11L): Switch to switch interconnection.

DESCRIPTION OF INVENTION

PCI Express is a bus or I/O interconnect standard for use inside a computer or embedded system, enabling faster data transfers to and from peripheral devices. The standard is still evolving but has achieved a degree of stability such that other applications can be implemented using PCIE as a basis. A PCIE based interconnect scheme is proposed to enable switching and inter-connection between multiple PCIE enabled systems, each having its own PCIE root complex, such that the scalability of the PCIE architecture can be applied to data transport between connected systems to form a cluster of systems. These connected systems can be any computing, control, storage or embedded systems. The scalability of the interconnect will allow the cluster to grow the bandwidth between the systems as it becomes necessary, without changing to a different connection architecture.

FIG. 1 is a typical cluster interconnect. The multi-system cluster shown consists of eight units or systems {(1) to (8)} that are to be interconnected. Each system is a PCI Express (PCIE) based system with a PCIE root complex for control of data transfer to and from connected peripheral devices via PCIE peripheral modules, as is standard for PCIE based systems. Each system to be interconnected has at least one PCIE based peripheral module {(1a) to (8a)} as an IO module, at the interconnect port enabled for system interconnection, with n links built into or attached to the system. (9) is an interconnect module or switch sub-system, which has a number of PCIE based connection modules equal to or greater than the number of systems to be interconnected, in the case of FIG. 1 this number being eight {(1b) to (8b)}, that can be interconnected for data transfer through the switch. A software based control input (10) is provided to configure and/or control the operation of the switch and enable connections between the switch ports for transfer of data. Link connections {(1L) to (8L)} attach the PCIE based peripheral modules 1a to 8a, enabled for interconnection on the respective systems 1 to 8, to the corresponding PCIE based peripheral modules 1b to 8b on the switch, with n links each. The value of n can vary depending on the connection bandwidth required by the system.

When data has to be transferred between, say, system 1 and system 5, in the simple case the control input is used to establish an internal link between PCIE based peripheral modules 1b and 5b at the respective ports of the switch. A handshake is established between the outbound communication enabled PCIE based peripheral module (PCIE module) 1a and the inbound PCIE module 1b at the switch port, and between the outbound PCIE module 5b at the switch port and the inbound communication enabled PCIE module 5a. This provides a through connection from PCIE module 1a to PCIE module 5a through the switch, allowing data transfer. Data can then be transferred at speed between the modules, and hence between the systems. In more complex cases, data can also be queued in storage implemented in the switch at the ports and then, when links are free, transferred out to the right systems at speed.
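
The connect operation just described can be sketched in software. The following Python model is a minimal illustration under assumed names (PcieModule, SwitchSubSystem, configure and transfer are all invented here, not part of the disclosure): modules 1a to 8a on the systems link to modules 1b to 8b at the switch, the control input joins two switch ports, and data either passes straight through or is held in port storage when no internal link exists.

    # Minimal illustrative model of the FIG. 1 connect operation (assumed
    # names; not actual switch firmware or a real PCIE API).
    from collections import deque

    class PcieModule:
        """A PCIE peripheral module at a system or switch port."""
        def __init__(self, name):
            self.name = name
            self.peer = None          # module at the other end of the data link

        def link_to(self, other):
            self.peer, other.peer = other, self

    class SwitchSubSystem:
        """Switch (9) with one inbound PCIE module per attached system."""
        def __init__(self, n_ports):
            self.ports = {i: PcieModule(f"{i}b") for i in range(1, n_ports + 1)}
            self.internal_links = set()  # port pairs joined by the control input
            self.queues = {i: deque() for i in range(1, n_ports + 1)}

        def configure(self, a, b):
            """Software control input (10): join two switch ports internally."""
            self.internal_links.add(frozenset((a, b)))

        def transfer(self, src, dst, payload):
            """Move data from port src to port dst, queuing if not linked."""
            if frozenset((src, dst)) in self.internal_links:
                return self.ports[dst].peer, payload   # through connection
            self.queues[dst].append(payload)           # hold in port storage
            return None, None

    # Systems 1..8 with modules 1a..8a linked to switch modules 1b..8b.
    switch = SwitchSubSystem(8)
    systems = {i: PcieModule(f"{i}a") for i in range(1, 9)}
    for i, mod in systems.items():
        mod.link_to(switch.ports[i])

    switch.configure(1, 5)                     # control establishes 1b <-> 5b
    receiver, data = switch.transfer(1, 5, b"payload")
    print(receiver.name, data)                 # -> 5a b'payload'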

Multiple systems can be interconnected at one time to form a multi-system that allows data and information transfer and sharing through the switch. It is also possible to connect smaller clusters together, to accommodate growth in system volume, by using an available connection scheme that interconnects the switches that form the nodes of the cluster.

If the need for higher bandwidth and lower latency data transfers between systems increases, the connections can grow by increasing the number of links connecting the PCIE modules between the systems in the cluster and the switch, without completely changing the architecture of the interconnect. This scalability is of great importance in retaining flexibility for growth and scaling of the cluster.

It should be understood that the systems may consist of peripheral devices, storage devices, processors and any other communication devices. The interconnect is agnostic to the type of device as long as it has a PCIE module at the port to enable the connection to the switch. This feature will reduce the cost of expanding the system, since the switch interconnect density alone needs to change for growth of the multi-system.

PCIE is currently being standardized, and that will enable existing PCIE modules from different vendors to be used to reduce the overall cost of the system. In addition, using a standardized module in the systems as well as the switch will allow the cost of software development to be reduced and, in the long run, allow available software to be used to configure and run the systems.

As expansion of the cluster, in terms of the number of systems connected, bandwidth usage and control, will be cost effective, it is expected that the overall system cost can be reduced and overall performance improved by standardized PCIE module use with standardized software control.

A typical connect operation may be explained with reference to two of the systems, for example system (1) and system (5). System (1) has a PCIE module (1a) at its interconnect port, and that module is connected by the connection link, or data-link or link, (1L) to a PCIE module (1b) at the IO port of the switch (9). System (5) is similarly connected to the switch, through the PCIE module (5a) at its interconnect port to the PCIE module (5b) at the switch (9) IO port, by link (5L). Each PCIE module operates for transfer of data to and from it by standard PCI Express protocols, as provided by the configuration software loaded into the PCIE modules and the switch. The switch operates under the software control and configuration loaded in through the software configuration input (10).
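
How the software configuration input (10) might drive this setup can be illustrated with a small sketch. The table format and the load_config helper below are hypothetical, invented purely for illustration; the disclosure does not prescribe a configuration format.

    # Illustrative only: a declarative table, loaded through the software
    # configuration input (10), naming which switch port pairs may exchange
    # data and over how many links. Format and names are assumptions.
    CONFIG = """
    # port_a  port_b  links(n)
      1       5       4
      2       3       1
    """

    def load_config(text: str):
        """Parse the table into (port_a, port_b, n_links) tuples."""
        entries = []
        for line in text.splitlines():
            line = line.split("#", 1)[0].strip()   # drop comments and blanks
            if line:
                a, b, n = line.split()
                entries.append((int(a), int(b), int(n)))
        return entries

    for a, b, n in load_config(CONFIG):
        print(f"establish internal link {a}b <-> {b}b over x{n} links")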

FIG. 2 is that of a multi-switch cluster. As the need to interconnect a larger number of systems increases, it will be optimal to interconnect the switches of multiple clusters to form a new, larger cluster. Such a connection is shown in FIG. 2. The connection shown is for two smaller clusters (12-1 and 12-2) interconnected using PCIE modules that can be connected together using any low latency switch to switch connection (11-1 and 11-2), connected using interconnect links (11L) to provide sufficient bandwidth for the connection. The switch to switch connection transmits and receives data and information using any suitable protocol, and the switches provide the interconnection internally through the software configuration loaded into them.
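
A transfer that crosses the two clusters of FIG. 2 traverses both switches and the switch-to-switch link. The Python sketch below traces those hops; the lookup tables and the hops function are invented for illustration (the figure does not prescribe a routing algorithm), with system names assumed.

    # Illustrative model of FIG. 2: two clusters (12-1, 12-2), each with its
    # own switch sub-system (9-1, 9-2), joined by switch-to-switch
    # interconnect modules (11-1, 11-2) over link 11L.
    CLUSTER_OF = {  # which cluster each (hypothetical) system belongs to
        "sys1": "12-1", "sys2": "12-1",
        "sys7": "12-2", "sys8": "12-2",
    }
    SWITCH_OF = {"12-1": "9-1", "12-2": "9-2"}
    EXPANSION_OF = {"9-1": "11-1", "9-2": "11-2"}

    def hops(src, dst):
        """List the modules a transfer crosses between two systems."""
        path = [src, SWITCH_OF[CLUSTER_OF[src]]]
        if CLUSTER_OF[src] != CLUSTER_OF[dst]:     # crosses clusters
            path += [EXPANSION_OF[path[-1]], "11L",
                     EXPANSION_OF[SWITCH_OF[CLUSTER_OF[dst]]],
                     SWITCH_OF[CLUSTER_OF[dst]]]
        path.append(dst)
        return path

    print(" -> ".join(hops("sys1", "sys8")))
    # sys1 -> 9-1 -> 11-1 -> 11L -> 11-2 -> 9-2 -> sys8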

The following are some of the advantages of the disclosed interconnect scheme:

1. Provides a low latency interconnect for the cluster.
2. Uses PCI Express based protocols for data and information transfer within the cluster.
3. Ease of growth in bandwidth, as system requirements increase, by increasing the number of links within the cluster.
4. Standardized PCIE component use in the cluster reduces initial cost.
5. Lower cost of growth due to standardization of hardware and software.
6. A path of expansion from a small cluster to larger clusters as need grows.
7. Future proofed system architecture.
8. Any speed increase in the switch and link connections due to technology advances is directly applicable to the interconnection scheme.

The circuit implementations can be any one or a combination of integrated-circuit, FPGA, system on chip (SOC), chip on board (COB), optical, or hybrid circuit implementations. In fact, the disclosed interconnect scheme provides advantages for low latency multi-system cluster growth that are not available from any other source.

While the invention has been described in terms of several embodiments, those of ordinary skill in the art will recognize that the invention is not limited to the embodiments described, but can be practiced with modification and alteration within the spirit and scope of the appended claims. Multiple existing methods, and methods developed using newly developed technology, may be used to establish the handshake between systems and to improve data transfer and latency. The description is thus to be regarded as illustrative instead of limiting, and capable of using any new technology developments in the field of communication and data transfer. There are numerous other variations to different aspects of the invention described above, which in the interest of conciseness have not been provided in detail. Accordingly, other embodiments are limited only by the scope of the claims.

CLAIMS

1. A method of interconnecting a plurality of PCI-Express based computing systems, in a cluster, using a PCI-Express based network switch, each of the plurality of PCI-Express based computing systems comprising a PCI-Express root complex, the method comprising: connecting an outbound PCI-Express port of each one of the PCI-Express based computing systems to an inbound PCI-Express port of the PCI-Express based network switch, wherein said PCI-Express based network switch provides data transfer back and forth between said PCI-Express based computing systems using PCI-Express protocol; and wherein the PCI-Express based network switch comprises two or more inbound PCI-Express ports.
2. The method of interconnecting a plurality of PCI-Express based computing systems of claim 1, using the PCI-Express based network switch for data transfer, the method further comprising: connecting a first of the plurality of PCI-Express based computing systems to a first inbound port of the PCI-Express based network switch by way of a first PCI-Express outbound port on the first PCI-Express based computing system; connecting a second of the plurality of PCI-Express based computing systems to a second inbound port of the PCI-Express based network switch by way of a second PCI-Express outbound port on the second PCI-Express based computing system; wherein the first PCI-Express outbound port of the first of the interconnected PCI-Express based computing systems is connected to the PCI-Express root complex of the first of the interconnected PCI-Express based computing systems and the second PCI-Express outbound port of the second of the interconnected PCI-Express based computing systems is connected to the PCI-Express root complex of the second of the interconnected PCI-Express based computing systems; transferring data to and from the first of the interconnected PCI-Express based computing systems and the first PCI-Express inbound port of the PCI-Express switch using PCI-Express protocol; transferring data to and from the second of said interconnected PCI-Express based computing systems and the second PCI-Express inbound port of the PCI-Express switch using PCI-Express protocol; and transferring data between the first PCI-Express inbound port on the PCI-Express based network switch and the second PCI-Express inbound port on the PCI-Express based network switch, such that data transfer and communication is performed between the first and second of the interconnected PCI-Express based computing systems in the cluster using PCI-Express protocol.
3. The method of claim 1, wherein the PCI-Express switch further comprises a first PCI-Express expansion port, wherein the first PCI-Express expansion port utilizes PCI-Express protocol and is enabled to connect to a second PCI-Express expansion port on a second PCI-Express switch.
4. The method of claim 1, wherein the outbound PCI-Express port of each one of the PCI-Express based computing systems connecting to the network switch is configured for system interconnection.
5. A method for exchanging data between a plurality of interconnected PCI-Express enabled computing devices, each comprising a root complex controlling a PCI-Express bus, forming a cluster, over PCI-Express links and PCI-Express based network switches, using PCI-Express protocol, the method comprising: interconnecting the plurality of PCI-Express enabled computing systems, each having at least a PCI-Express enabled outbound port; connecting the outbound port of each of the plurality of PCI-Express enabled computing systems to an inbound port on at least one of the PCI-Express based network switches using PCI-Express links and PCI-Express protocol, thereby forming a cluster of interconnected PCI-Express based processing systems; wherein the interconnecting enables exchanging of data between the interconnected computing devices of the cluster.
6. The method of claim 5, for exchanging data between any of a first and a second of the plurality of interconnected PCI-Express enabled computing systems using PCI-Express protocol over PCI-Express links and PCI-Express based network switches, further comprising: transferring data from the first of the PCI-Express enabled computing systems to the second of the PCI-Express enabled computing systems and transferring data from the second of the PCI-Express enabled computing systems to the first of the PCI-Express enabled computing systems; wherein transferring data from the first of the PCI-Express enabled computing systems to the second of the PCI-Express enabled computing systems comprises: a) transferring data from a first outbound PCI-Express port on the first of the PCI-Express enabled computing systems to a first inbound PCI-Express port on a first PCI-Express based network switch; b) transferring data from the first inbound PCI-Express port on the first PCI-Express based network switch to a second inbound PCI-Express port on the first PCI-Express based network switch; c) transferring data from the second inbound PCI-Express port on the first PCI-Express based network switch to a second outbound PCI-Express port on the second of the PCI-Express enabled computing systems; and transferring data from the second of the PCI-Express enabled computing systems to the first of the PCI-Express enabled computing systems, comprising: d) transferring data from the second outbound PCI-Express port on the second of the PCI-Express enabled computing systems to the second inbound PCI-Express port on the first PCI-Express based network switch; e) transferring data from the second inbound PCI-Express port on the first PCI-Express based network switch to the first inbound PCI-Express port on the first PCI-Express based network switch; f) transferring data from the first inbound PCI-Express port on the first PCI-Express based network switch to the first outbound PCI-Express port on the first of the PCI-Express enabled computing systems; wherein the PCI-Express based network switch comprises a plurality of inbound ports that are enabled as PCI-Express peripheral modules for enabling connection to the outbound ports of the plurality of PCI-Express enabled computing systems to be interconnected using PCI-Express links.
7. The method of claim 6, wherein all data transfers to and from the outbound PCI-Express ports of each PCI-Express enabled computing system are under the control of the root complex of the respective PCI-Express enabled computing system.
8. The method of claim 6, wherein each PCI-Express based network switch also includes at least a PCI-Express-enabled expansion port comprising a PCI-Express enabled module for connecting to a PCI-Express-enabled port comprising a PCI-Express enabled module on another PCI-Express network switch using PCI-Express links, for exchanging data between a first cluster of PCI-Express enabled computing systems connected to a first PCI-Express based network switch and a second cluster of PCI-Express enabled computing systems connected to a second PCI-Express based network switch using PCI-Express protocol.
9. The method of claim 8, wherein the capability to exchange data over PCI-Express links using PCI-Express protocol, between a plurality of clusters of PCI-Express enabled computing systems, the plurality of clusters interconnected via expansion ports on PCI-Express based network switches with PCI-Express links and using PCI-Express protocol, forming a super cluster, enables the processing capability to expand as needed.
10. The method of claim 6, wherein the PCI-Express based computing systems comprise computing elements selected from a group comprising computers, other systems enabled for computation, sensor systems, control systems, storage systems and embedded systems.
11. The method of claim 6, wherein the PCI-Express based clustering comprises integrated circuit implementations, FPGA implementations, system on chip implementations, and chip on board implementations of the PCI-Express based computing systems as clusters.
12. The method of claim 6, wherein the outbound PCI-Express port of each one of the PCI-Express based computing systems connecting to the network switch is configured for system interconnection.
13. The method of claim 6, wherein the PCI-Express based network switch and PCI-Express links are agnostic to the type of data transmission used.
14. The method of claim 6, wherein the network switch implementation comprises any one or more of an integrated-circuit implementation, an FPGA implementation, a system on chip (SOC) implementation, a chip on board (COB) implementation, an optical implementation, and a hybrid circuit implementation.
15. The method of claim 6, wherein each PCI-Express switch comprises one or more outbound PCI-Express ports for communicating with inbound PCI-Express ports of one or more PCI-Express peripheral devices.